AAAI.2024 - Demonstration Track

Total: 30

#1 SemLa: A Visual Analysis System for Fine-Grained Text Classification

Authors: Munkhtulga Battogtokh ; Cosmin Davidescu ; Michael Luck ; Rita Borgo

Fine-grained text classification requires models to distinguish between many fine-grained classes that are hard to tell apart. However, despite the increased risk of models relying on confounding features and predictions being especially difficult to interpret in this context, existing work on the interpretability of fine-grained text classification is severely limited. Therefore, we introduce our visual analysis system, SemLa, which incorporates novel visualization techniques that are tailored to this challenge. Our evaluation based on case studies and expert feedback shows that SemLa can be a powerful tool for identifying model weaknesses, making decisions about data annotation, and understanding the root cause of errors.

#2 Interactive Plan Selection Using Linear Temporal Logic, Disjunctive Action Landmarks, and Natural Language Instruction

Authors: Tathagata Chakraborti ; Jungkoo Kang ; Francesco Fuggitti ; Michael Katz ; Shirin Sohrabi

We present Lemming, a visualization tool for the interactive selection of plans for a given problem, allowing the user to efficiently whittle down the set of plans and select their plan(s) of choice. We demonstrate four different user experiences for this process: three based on the principle of using disjunctive action landmarks as guidance to cut down the set of choice points for the user, and one based on the use of linear temporal logic (LTL) to impart additional constraints on the plan set via natural language (NL) instruction.

#3 SOCIALGYM 2.0: Simulator for Multi-Robot Learning and Navigation in Shared Human Spaces

Authors: Rohan Chandra ; Zayne Sprague ; Joydeep Biswas

We present Social Gym 2.0, a simulator for multi-agent navigation research. Our simulator enables navigation for multiple autonomous agents, replicating real-world dynamics in complex indoor environments, including doorways, hallways, intersections, and roundabouts. Unlike current simulators that concentrate on single robots in open spaces, Social Gym 2.0 employs multi-agent reinforcement learning (MARL) to develop optimal navigation policies for multiple robots with diverse, dynamic constraints in complex environments. Social Gym 2.0 also departs from accepted software design standards by employing a configuration-over-convention paradigm, providing the capability to benchmark different MARL algorithms as well as to customize observation and reward functions. Users can additionally create their own environments and evaluate various algorithms, based on both deep reinforcement learning and classical navigation, using a broad range of social navigation metrics.

#4 Enhancing Machine Translation Experiences with Multilingual Knowledge Graphs

Authors: Simone Conia ; Daniel Lee ; Min Li ; Umar Farooq Minhas ; Yunyao Li

Translating entity names, especially when a literal translation is not correct, poses a significant challenge. Although Machine Translation (MT) systems have achieved impressive results, they still struggle to translate cultural nuances and language-specific context. In this work, we show that integrating multilingual knowledge graphs into MT systems can address this problem and bring two significant benefits: i) improving the translation of utterances that contain entities by leveraging their human-curated aliases from a multilingual knowledge graph, and ii) increasing the interpretability of the translation process by providing the user with information from the knowledge graph.
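The first benefit can be pictured as an alias-substitution step around the MT model. The sketch below is purely illustrative: the alias table, the Wikidata-style ID, and the function name are assumptions for demonstration, not the authors' system, and entity linking of the source text is assumed to happen upstream.

```python
# Hypothetical alias table keyed by a Wikidata-style ID (Q90 is Paris);
# entity linking of the source text is assumed to happen upstream.
ALIASES = {"Q90": {"en": "Paris", "it": "Parigi", "ja": "パリ"}}

def substitute_entities(text, links, target_lang):
    """Swap linked entity mentions for their human-curated alias in the
    target language, so the MT system never has to guess the name."""
    for mention, qid in links.items():
        text = text.replace(mention, ALIASES[qid][target_lang])
    return text
```

Because the alias comes from a curated knowledge graph rather than the MT model, the same lookup can also be surfaced to the user, which is the interpretability benefit the abstract mentions.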

#5 From Static to Dynamic: Knowledge Metabolism for Large Language Models

Authors: Mingzhe Du ; Anh Tuan Luu ; Bin Ji ; See-Kiong Ng

The immense parameter space of Large Language Models (LLMs) endows them with superior knowledge retention capabilities, allowing them to excel in a variety of natural language processing tasks. However, it also makes it difficult to consistently tune LLMs to incorporate the most recent knowledge, which may lead them to produce inaccurate and fabricated content. To alleviate this issue, we propose DynaMind, a knowledge metabolism framework for LLMs. The framework proactively sustains the credibility of knowledge through an auxiliary external memory component and directly delivers pertinent knowledge during LLM inference, thereby suppressing hallucinations caused by obsolete internal knowledge. Benchmark experiments demonstrate DynaMind's effectiveness in overcoming this challenge. The code and demo of DynaMind are available at: https://github.com/Elfsong/DynaMind.
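The external-memory idea reduces to a retrieve-then-generate loop: fetch the freshest relevant entry, then condition generation on it. The sketch below is a minimal stand-in (the function names, word-overlap retriever, and toy memory are assumptions for illustration, not the DynaMind API):

```python
# Illustrative retrieve-then-generate loop over an external memory
# (names and data are hypothetical, not the actual DynaMind API).
def retrieve(memory, query, k=1):
    """Rank memory entries by word overlap with the query."""
    q_words = set(query.lower().split())
    return sorted(memory,
                  key=lambda entry: len(q_words & set(entry.lower().split())),
                  reverse=True)[:k]

def answer(memory, query, generate):
    """Prepend the freshest relevant knowledge so the model does not
    fall back on stale internal parameters."""
    context = " ".join(retrieve(memory, query))
    return generate(f"Context: {context}\nQuestion: {query}")

memory = [
    "The 2024 Olympics host city is Paris.",
    "Python 3.12 was released in October 2023.",
]
```

Keeping the memory current is what lets the system answer questions whose facts postdate the model's training, without retuning any parameters.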

#6 MANDREL: Modular Reinforcement Learning Pipelines for Material Discovery

Authors: Clyde Fare ; George K. Holt ; Lamogha Chiazor ; Michail Smyrnakis ; Robert Tracey ; Lan Hoang

AI-driven materials discovery is evolving rapidly with new approaches and pipelines for experimentation and design. However, the pipelines are often designed in isolation. We introduce a modular reinforcement learning framework for inter-operable experimentation and design of tailored, novel molecular species. The framework unifies reinforcement learning (RL) pipelines and allows the mixing and matching of choices for the underlying chemical action space, molecular representation, desired molecular properties, and RL algorithm. Our demo showcases the framework's capabilities applied to benchmark problems like quantitative estimate of drug-likeness and PLogP, as well as the design of novel small molecule solvents for carbon capture.

#7 LLMGuard: Guarding against Unsafe LLM Behavior

Authors: Shubh Goyal ; Medha Hira ; Shubham Mishra ; Sukriti Goyal ; Arnav Goel ; Niharika Dadu ; Kirushikesh DB ; Sameep Mehta ; Nishtha Madaan

Although the rise of Large Language Models (LLMs) in enterprise settings brings new opportunities and capabilities, it also brings challenges, such as the risk of generating inappropriate, biased, or misleading content that violates regulations and raises legal concerns. To alleviate this, we present "LLMGuard", a tool that monitors user interactions with an LLM application and flags content against specific behaviours or conversation topics. To do this robustly, LLMGuard employs an ensemble of detectors.
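A detector ensemble of this kind can be wired together very simply: each detector votes independently, and any positive vote flags the content. The two toy detectors below are stand-ins invented for illustration; the abstract does not describe LLMGuard's actual detectors.

```python
# Two toy detectors standing in for LLMGuard's ensemble members
# (the real tool's detectors are not described in the abstract).
def contains_pii(text):
    return "@" in text                      # crude email-address check

def contains_toxicity(text):
    return "idiot" in text.lower()          # crude keyword check

DETECTORS = {"pii": contains_pii, "toxicity": contains_toxicity}

def guard(text):
    """Run every detector and block the content if any of them fires."""
    flags = [name for name, detect in DETECTORS.items() if detect(text)]
    return {"blocked": bool(flags), "flags": flags}
```

The registry-of-detectors shape makes the ensemble extensible: adding a new behaviour to monitor is one new entry in the dictionary.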

#8 Interactive Visual Task Learning for Robots

Authors: Weiwei Gu ; Anant Sah ; Nakul Gopalan

We present a demonstrable framework for robots to learn novel visual concepts and visual tasks via in-situ linguistic interactions with human users. Previous approaches in computer vision have either used large pre-trained visual models to infer novel objects zero-shot or added novel concepts, along with their attributes and representations, to a concept hierarchy. We extend the approaches that focus on learning visual concept hierarchies and take this ability one step further to demonstrate novel task solving on robots along with the learned visual concepts. To enable a visual concept learner to solve robotics tasks one-shot, we developed two distinct techniques. First, we propose a novel approach, Hi-Viscont (HIerarchical VISual CONcept learner for Task), which propagates information about a novel concept being taught to its parent nodes within a concept hierarchy. This information propagation allows all concepts in the hierarchy to update as novel concepts are taught in a continual learning setting. Second, we represent a visual task as a scene graph with language annotations, allowing us to create novel permutations of a demonstrated task zero-shot in-situ. Combining the two techniques, we present a demonstration on a real robot that learns visual tasks and concepts in one shot from in-situ interactions with human users and generalizes to perform a novel visual task of the same type zero-shot. As shown by the studies in the main conference paper, our system achieves a success rate of 50% on solving the whole task correctly with generalization, whereas the baseline performs at 14% without any ability to generalize to novel tasks and concepts. We will demonstrate our working interactive learning pipeline at AAAI 2024 in person with our robot and other required hardware.

#9 Fast & Fair: A Collaborative Platform for Fair Division Applications

Authors: Jiatong Han ; Warut Suksompong

Fair division, the study of how to fairly allocate resources among agents, has received substantial interest in the areas of artificial intelligence and multiagent systems. While there is an extensive theoretical literature on fair division by now, the developed algorithms are still mostly confined to research papers and inaccessible to the public. We attempt to bridge this gap by developing Fast & Fair, an open-source web application that hosts a number of fair allocation algorithms with user-friendly interfaces and explainable outcomes. In contrast to existing implementations, Fast & Fair is a collaborative platform that is open to community contributions and thereby facilitates the deployment of additional algorithms.
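One classic algorithm such a platform might host is round-robin allocation, which for additive valuations guarantees envy-freeness up to one item (EF1). The sketch below is a generic textbook implementation, not code from Fast & Fair:

```python
def round_robin(agents, items, value):
    """Agents take turns picking their favourite remaining item; for
    additive valuations the result is envy-free up to one item (EF1)."""
    remaining = list(items)
    allocation = {a: [] for a in agents}
    turn = 0
    while remaining:
        agent = agents[turn % len(agents)]
        best = max(remaining, key=lambda item: value[agent][item])
        allocation[agent].append(best)
        remaining.remove(best)
        turn += 1
    return allocation
```

The explainability angle the abstract mentions fits naturally here: each pick can be justified to the user as "your highest-valued item among those still available on your turn".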

#10 Tools Identification By On-Board Adaptation of Vision-and-Language Models

Authors: Jun Hu ; Phil Miller ; Michael Lomnitz ; Saurabh Farkya ; Emre Yilmaz ; Aswin Raghavan ; David Zhang ; Michael Piacentino

A robotic workshop assistant has been a long-standing grand challenge for robotics, speech, computer vision, and artificial intelligence (AI) research. We revisit the goal of visual identification of tools from human queries in the current era of large vision-and-language models (like GPT-4). We find that current off-the-shelf models, trained on internet images, are unable to overcome the domain shift: they fail to identify small, obscure tools in cluttered environments and cannot match tools to their intended purpose or affordances. We present a novel system for online domain adaptation that can run directly on a small on-board processor. The system uses Hyperdimensional Computing (HD), a fast and efficient neuromorphic method. We adapted CLIP to work with explicit ("I need the hammer") and implicit purpose-driven queries ("Drive these nails"), and even with depth images as input. This demo allows the user to try out various real tools and interact via free-form audio.
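At its core, CLIP-style tool identification is nearest-neighbour search under cosine similarity between a query embedding and per-tool embeddings. The toy 3-dimensional vectors below are made up for illustration; the real system uses CLIP's high-dimensional encoders adapted on-board via HD computing.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def identify(query_vec, tool_vecs):
    """Return the tool whose embedding is most similar to the query."""
    return max(tool_vecs, key=lambda name: cosine(query_vec, tool_vecs[name]))

# made-up 3-dimensional embeddings; real systems use CLIP's encoders
tools = {"hammer": [0.9, 0.1, 0.0],
         "wrench": [0.1, 0.8, 0.1],
         "saw":    [0.0, 0.1, 0.9]}
```

Handling an implicit query like "Drive these nails" then amounts to embedding the purpose phrase into the same space so it lands near the hammer's embedding.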

#11 AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

Authors: Rongjie Huang ; Mingze Li ; Dongchao Yang ; Jiatong Shi ; Xuankai Chang ; Zhenhui Ye ; Yuning Wu ; Zhiqing Hong ; Jiawei Huang ; Jinglin Liu ; Yi Ren ; Yuexian Zou ; Zhou Zhao ; Shinji Watanabe

Large language models (LLMs) have exhibited remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. Despite the recent success, current LLMs are not capable of processing complex audio information or conducting spoken conversations (like Siri or Alexa). In this work, we propose a multi-modal AI system named AudioGPT, which complements LLMs (i.e., ChatGPT) with 1) foundation models to process complex audio information and solve numerous understanding and generation tasks; and 2) an input/output interface (ASR, TTS) to support spoken dialogue. With an increasing demand to evaluate multi-modal LLMs on human intention understanding and cooperation with foundation models, we outline the principles and processes and test AudioGPT in terms of consistency, capability, and robustness. Experimental results demonstrate the capabilities of AudioGPT in solving 16 AI tasks involving speech, music, sound, and talking-head understanding and generation in multi-round dialogues, empowering humans to create rich and diverse audio content with unprecedented ease. Code can be found at https://github.com/AIGC-Audio/AudioGPT

#12 Knowledge-Powered Recommendation for an Improved Diet Water Footprint

Authors: Saurav Joshi ; Filip Ilievski ; Jay Pujara

According to WWF, 1.1 billion people lack access to water, and 2.7 billion experience water scarcity at least one month a year. By 2025, two-thirds of the world's population may be facing water shortages. This highlights the urgency of managing water usage efficiently, especially in water-intensive sectors like food. This paper proposes a recommendation engine, powered by knowledge graphs, that aims to facilitate sustainable and healthy food consumption. The engine recommends ingredient substitutes in user recipes that improve nutritional value and reduce environmental impact, particularly water footprint. The system architecture includes source identification, information extraction, schema alignment, knowledge graph construction, and user interface development. The research offers a promising tool for promoting healthier eating habits and contributing to water conservation efforts.
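The core recommendation step can be pictured as a lookup over two knowledge-graph fragments: a footprint table and substitutability links. Everything below is a hypothetical sketch; the footprint figures are illustrative placeholders, not the paper's data, and the real engine also weighs nutritional value.

```python
# Hypothetical knowledge-graph fragments: water footprints (litres per kg,
# figures are illustrative placeholders) and substitutability links.
FOOTPRINT = {"beef": 15400, "lentils": 5874, "chicken": 4325, "tofu": 2523}
SUBSTITUTES = {"beef": ["lentils", "tofu"], "chicken": ["tofu"]}

def suggest(ingredient):
    """Return the substitute with the lowest water footprint, or None
    if no substitute improves on the original ingredient."""
    better = [s for s in SUBSTITUTES.get(ingredient, [])
              if FOOTPRINT[s] < FOOTPRINT[ingredient]]
    return min(better, key=FOOTPRINT.get) if better else None
```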

#13 Reading between the Lines: Image-Based Order Detection in OCR for Chinese Historical Documents

Authors: Hsing-Yuan Ma ; Hen-Hsen Huang ; Chao-Lin Liu

Chinese historical documents, with their unique layouts and reading patterns, pose significant challenges for traditional Optical Character Recognition (OCR) systems. This paper introduces a tailored OCR system designed to address these complexities, particularly emphasizing the crucial aspect of Reading Order Detection (ROD). Our system operates through a threefold process: text detection using the Differential Binarization++ model, text recognition with the SVTR Net, and a novel ROD approach harnessing raw image features. This innovative method for ROD, inspired by human perception, utilizes visual cues present in raw images to deduce the inherent sequence of ancient texts. Preliminary results show promising reductions in page error rates. By preserving both content and context, our system contributes meaningfully to the accurate and contextual digitization of Chinese historical manuscripts.
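To see why reading order is hard here, note that classical Chinese layouts are read in columns, right-to-left, top-to-bottom within a column. A purely geometric baseline for intuition (the paper's ROD instead learns the order from raw image features, and real layouts are far messier than fixed-width columns):

```python
def reading_order(boxes, col_width=50):
    """Order detected text boxes for a classical Chinese layout:
    columns are read right-to-left, characters top-to-bottom within
    a column. A geometric baseline only; the paper's ROD learns the
    order from raw image features."""
    def key(box):
        col = box["x"] // col_width        # bucket boxes into columns
        return (-col, box["y"])            # rightmost column first, then top-down
    return sorted(boxes, key=key)
```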

#14 MIDDAG: Where Does Our News Go? Investigating Information Diffusion via Community-Level Information Pathways

Authors: Mingyu Derek Ma ; Alexander K. Taylor ; Nuan Wen ; Yanchen Liu ; Po-Nien Kung ; Wenna Qin ; Shicheng Wen ; Azure Zhou ; Diyi Yang ; Xuezhe Ma ; Nanyun Peng ; Wei Wang

We present MIDDAG, an intuitive, interactive system that visualizes the information propagation paths on social media triggered by COVID-19-related news articles, accompanied by comprehensive insights including user- and community-level susceptibility as well as the events and popular opinions raised by the crowd while propagating the information. Besides discovering information flow patterns among users, we construct communities among users and develop a propagation forecasting capability, enabling tracing and understanding of how information is disseminated at a higher level. A demo video and more are available at https://info-pathways.github.io.

#15 ESG Accountability Made Easy: DocQA at Your Service

Authors: Lokesh Mishra ; Cesar Berrospi ; Kasper Dinkla ; Diego Antognini ; Francesco Fusco ; Benedikt Bothur ; Maksym Lysak ; Nikolaos Livathinos ; Ahmed Nassar ; Panagiotis Vagenas ; Lucas Morin ; Christoph Auer ; Michele Dolfi ; Peter Staar

We present Deep Search DocQA. This application enables information extraction from documents via a question-answering conversational assistant. The system integrates several technologies from different AI disciplines: document conversion to a machine-readable format (via computer vision), finding relevant data (via natural language processing), and formulating an eloquent response (via large language models). Users can explore over 10,000 Environmental, Social, and Governance (ESG) disclosure reports from over 2,000 corporations. The Deep Search platform can be accessed at: https://ds4sd.github.io.

#16 CHICOT: A Developer-Assistance Toolkit for Code Search with High-Level Contextual Information

Authors: Terufumi Morishita ; Yuta Koreeda ; Atsuki Yamaguchi ; Gaku Morio ; Osamu Imaichi ; Yasuhiro Sogawa

We propose a source code search system named CHICOT (Code search with HIgh level COnText) to assist developers in reusing existing code. While previous studies have examined code search on the basis of code-level, fine-grained specifications such as functionality, logic, or implementation, CHICOT addresses a unique mission: code search with high-level contextual information, such as the purpose or domain of a developer's project. It achieves this by first extracting the context information from codebases and then considering this context during the search. CHICOT provides a VSCode plugin for daily coding assistance, and its built-in crawler ensures up-to-date code suggestions. A case study attests to the utility of CHICOT in real-world scenarios.
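Context-aware ranking of this kind can be pictured as a weighted mix of code-level and project-context similarity. The word-overlap scoring and toy corpus below are assumptions for illustration, not CHICOT's actual model:

```python
def _overlap(a, b):
    """Jaccard word overlap as a stand-in similarity score."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / max(len(a | b), 1)

def search(query, project_context, corpus, w=0.5):
    """Rank snippets by a weighted mix of code-level match and
    high-level project-context match (illustrative scoring only)."""
    return max(
        corpus,
        key=lambda s: w * _overlap(query, s["code"])
                    + (1 - w) * _overlap(project_context, s["context"]),
    )

corpus = [
    {"code": "def load_table(path): ...", "context": "finance reporting dashboard"},
    {"code": "def load_table(path): ...", "context": "genomics analysis pipeline"},
]
```

The example corpus makes the point: two snippets with identical code are disambiguated purely by the project context, which is exactly the signal code-level search misses.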

#17 Expressive and Flexible Simulation of Information Spread Strategies in Social Networks Using Planning

Authors: Bharath Muppasani ; Vignesh Narayanan ; Biplav Srivastava ; Michael N. Huhns

In the digital age, understanding the dynamics of information spread and opinion formation within networks is paramount. This research introduces an innovative framework that combines the principles of opinion dynamics with the strategic capabilities of Automated Planning. We have developed, to the best of our knowledge, the first numeric PDDL domain tailored to opinion dynamics. Our tool empowers users to visualize intricate networks, simulate the evolution of opinions, and strategically influence that evolution to achieve specific outcomes. By harnessing Automated Planning techniques, our framework offers a nuanced approach to devising sequences of actions that transition a network from its current opinion landscape to a desired state. This holistic approach provides insights into the intricate interplay of individual nodes within a network and paves the way for targeted interventions. Furthermore, the tool facilitates human-AI collaboration, enabling users not only to understand information spread but also to devise practical strategies to mitigate potential harmful outcomes arising from it. Demo video: https://tinyurl.com/3k7bp99h
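For intuition on what the planner is steering, one standard opinion-dynamics update is the DeGroot model, where each node averages its neighbours' opinions. The demo itself plans over a numeric PDDL encoding; this sketch only shows the underlying dynamics:

```python
def degroot_step(opinions, weights):
    """One DeGroot update: each node's next opinion is the weighted
    average of the current opinions, where row i of `weights` holds
    node i's trust in every node and sums to 1."""
    n = len(opinions)
    return [sum(weights[i][j] * opinions[j] for j in range(n))
            for i in range(n)]
```

A planner's "actions" in this setting then correspond to interventions that perturb opinions or weights between updates so the network converges toward a desired state.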

#18 GEAR-Up: Generative AI and External Knowledge-Based Retrieval: Upgrading Scholarly Article Searches for Systematic Reviews

Authors: Kaushik Roy ; Vedant Khandelwal ; Valerie Vera ; Harshul Surana ; Heather Heckman ; Amit Sheth

This paper addresses the time-intensive nature of systematic reviews (SRs) and proposes a solution leveraging advancements in Generative AI (e.g., ChatGPT) and external knowledge augmentation (e.g., Retrieval-Augmented Generation). The proposed system, GEAR-Up, automates query development and translation in SRs, enhancing efficiency by enriching user queries with context from language models and knowledge graphs. Qualitative evaluations conducted in collaboration with librarians demonstrate improved reproducibility and search strategy quality. Access the demo at https://youtu.be/zMdP56GJ9mU.
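Query enrichment for an SR search strategy can be pictured as expanding a user term into an OR-joined Boolean string with curated synonyms. The synonym table below is a hypothetical stand-in for the knowledge graph, invented for illustration:

```python
# Hypothetical synonym table standing in for the knowledge graph
SYNONYMS = {"heart attack": ["myocardial infarction", "MI"]}

def enrich_query(query):
    """Expand a user query with curated synonyms into an OR-joined
    Boolean search string, as a librarian would for an SR search."""
    terms = [query] + SYNONYMS.get(query.lower(), [])
    return " OR ".join(f'"{t}"' for t in terms)
```

Because the expansion comes from an explicit table rather than a model's latent knowledge, the resulting search strategy is reproducible, which is the property the librarian evaluations emphasize.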

#19 SciSpace Copilot: Empowering Researchers through Intelligent Reading Assistance

Authors: Trinita Roy ; Asheesh Kumar ; Daksh Raghuvanshi ; Siddhant Jain ; Goutham Vignesh ; Kartik Shinde ; Rohan Tondulkar

We introduce SciSpace Copilot, an AI research assistant that helps researchers read and understand papers faster through a rich set of features. Answering questions from a document has recently become popular using the Retrieval-Augmented Generation (RAG) approach. Our tool uses an advanced question-answering pipeline to produce accurate answers along with exact citations for each one. We provide many more valuable features for scientific text, including generating explanations, generating summaries, adding notes and highlights, and finding related papers from our corpus of 200 million papers. Our tool supports 100+ languages, making research more accessible across language barriers. Thousands of users use SciSpace Copilot daily, uploading their articles to understand research faster and better. Our tool can be accessed at this link: https://typeset.io.

#20 Robustness and Visual Explanation for Black Box Image, Video, and ECG Signal Classification with Reinforcement Learning

Authors: Soumyendu Sarkar ; Ashwin Ramesh Babu ; Sajad Mousavi ; Vineet Gundecha ; Avisek Naug ; Sahand Ghorbanpour

We present a generic Reinforcement Learning (RL) framework optimized for crafting adversarial attacks on different model types, spanning ECG signal analysis (1D), image classification (2D), and video classification (3D). The framework focuses on identifying sensitive regions and inducing misclassifications with minimal distortions across various distortion types. The novel RL method outperforms state-of-the-art methods for all three applications, proving its efficiency. Our RL approach produces superior localization masks, enhancing interpretability for image classification and ECG analysis models. For applications such as ECG analysis, our platform highlights critical ECG segments for clinicians while ensuring resilience against prevalent distortions. This comprehensive tool aims to bolster both resilience through adversarial training and transparency across varied applications and data types.

#21 Sustainability of Data Center Digital Twins with Reinforcement Learning

Authors: Soumyendu Sarkar ; Avisek Naug ; Antonio Guillen ; Ricardo Luna ; Vineet Gundecha ; Ashwin Ramesh Babu ; Sajad Mousavi

The rapid growth of machine learning (ML) has led to an increased demand for computational power, resulting in larger data centers (DCs) and higher energy consumption. To address this issue and reduce carbon emissions, intelligent design and control of DC components such as IT servers, cabinets, HVAC cooling, flexible load shifting, and battery energy storage are essential. However, the complexity of designing and controlling them in tandem presents a significant challenge. While some individual components like CFD-based design and Reinforcement Learning (RL) based HVAC control have been researched, there's a gap in the holistic design and optimization covering all elements simultaneously. To tackle this, we've developed DCRL-Green, a multi-agent RL environment that empowers the ML community to design data centers and research, develop, and refine RL controllers for carbon footprint reduction in DCs. It is a flexible, modular, scalable, and configurable platform that can handle large High Performance Computing (HPC) clusters. Furthermore, in its default setup, DCRL-Green provides a benchmark for evaluating single as well as multi-agent RL algorithms. It easily allows users to subclass the default implementations and design their own control approaches, encouraging community development for sustainable data centers. Open Source Link: https://github.com/HewlettPackard/dc-rl

#22 EmFORE: Learning Email Folder Classification Rules by Demonstration

Authors: Mukul Singh ; Gust Verbruggen ; José Cambronero ; Vu Le ; Sumit Gulwani

Tools that help with email folder management are limited, as users have to manually write rules to assign emails to folders. We present EmFORE, an iterative learning system that automatically learns and updates such rules from observations. EmFORE is fast enough to suggest and update rules in real time and suppresses low-confidence predictions to reduce the number of false positives. EmFORE can use different rule grammars, and thus be adapted to different clients, without changing the user experience. Previous methods do not learn rules, require complete retraining or multiple new examples after making a mistake, and do not distinguish between the inbox and other folders. EmFORE learns rules incrementally and can make the neutral decision of leaving emails in the inbox, making it an ideal candidate for integration in email clients.
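The incremental, confidence-thresholded behaviour can be sketched with a simple counting learner: observe sender-to-folder filings, and only file an email once the dominant rule is confident enough, otherwise fall back to the inbox. This is a conservative toy stand-in, not EmFORE's grammar-based rule learner:

```python
from collections import defaultdict

class RuleLearner:
    """Incrementally count sender-to-folder observations; only file an
    email once the dominant rule's empirical confidence clears a
    threshold, otherwise leave it in the inbox. A conservative sketch,
    not EmFORE's grammar-based rule learner."""

    def __init__(self, threshold=0.8):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.threshold = threshold

    def observe(self, sender, folder):
        self.counts[sender][folder] += 1

    def predict(self, sender):
        folders = self.counts.get(sender)
        if not folders:
            return "inbox"                       # neutral default
        total = sum(folders.values())
        folder, n = max(folders.items(), key=lambda kv: kv[1])
        return folder if n / total >= self.threshold else "inbox"
```

The `"inbox"` fallback mirrors the neutral decision the abstract describes: an uncertain prediction costs the user nothing, while a wrong filing would.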

#23 Interactive Human-Centric Bias Mitigation

Authors: Inge Vejsbjerg ; Elizabeth M. Daly ; Rahul Nair ; Svetoslav Nizhnichenkov

Bias mitigation algorithms differ in their definition of bias and in how they go about achieving that objective. These algorithms impact different cohorts differently, and allowing end users and data scientists to understand the impact of these differences so they can make informed choices remains a relatively unexplored domain. This demonstration presents an interactive bias mitigation pipeline that allows users to understand the cohorts impacted by their algorithm choice and to provide feedback, yielding a bias-mitigated pipeline that best aligns with their goals.

#24 Visual Language – Let the Product Say What You Want

Authors: Jiaying Wang ; Shuailing Hao ; Jing Shan ; Xiaoxu Song

Visual Language is a multitask online system for e-commerce that generates accurate product descriptions for sellers and provides convenient product retrieval for customers. To achieve this goal, the system adopts image description technology and multi-modal retrieval technology. By utilizing cross-modal generation techniques, we help sellers upload products rapidly and customers retrieve them quickly, improving the experience of both sellers and customers.

#25 The CoachAI Badminton Environment: Bridging the Gap between a Reinforcement Learning Environment and Real-World Badminton Games

Authors: Kuang-Da Wang ; Yu-Tse Chen ; Yu-Heng Lin ; Wei-Yao Wang ; Wen-Chih Peng

We present the CoachAI Badminton Environment, a reinforcement learning (RL) environment tailored for AI-driven sports analytics. In contrast to traditional environments that use rule-based opponents or simplistic physics-based randomness, our environment integrates authentic opponent AIs and realistic randomness derived from real-world match data to bridge the performance gap encountered in real-game deployments. This novel feature enables RL agents to seamlessly adapt to genuine scenarios. The CoachAI Badminton Environment empowers researchers to validate strategies in intricate real-world settings, offering: i) realistic opponent simulation for RL training; ii) visualizations for evaluation; and iii) performance benchmarks for assessing agent capabilities. By bridging the RL environment with actual badminton games, our environment advances the discovery of winning strategies for players. Our code is available at https://github.com/wywyWang/CoachAI-Projects/tree/main/Strategic%20Environment.
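The difference from rule-based opponents can be sketched with a toy rally environment whose replies are sampled from an empirical distribution, standing in for behaviour estimated from real matches. All names, shot types, and probabilities below are illustrative, not the actual CoachAI environment:

```python
import random

class RallyEnv:
    """Toy rally environment: the opponent's reply is sampled from an
    empirical distribution conditioned on our last shot, standing in
    for match-derived opponent behaviour. All numbers and names are
    illustrative, not from the actual CoachAI environment."""
    SHOTS = ["clear", "drop", "smash"]
    # hypothetical reply distribution conditioned on our shot
    REPLY = {
        "clear": [0.5, 0.3, 0.2],
        "drop":  [0.2, 0.5, 0.3],
        "smash": [0.3, 0.3, 0.4],
    }

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        self.last = "clear"
        return self.last  # observation: opponent's last shot

    def step(self, action):
        # stochastic reply mimics real-match variability
        reply = self.rng.choices(self.SHOTS, weights=self.REPLY[action])[0]
        reward = 1.0 if (action == "smash" and reply == "drop") else 0.0  # toy scoring
        self.last = reply
        return reply, reward, False, {}
```

An RL agent trained against such a distributional opponent must hedge against realistic reply variability instead of exploiting a deterministic rule, which is the gap the environment is designed to close.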